{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Forecasting, updating datasets, and the \"news\"\n",
    "\n",
    "In this notebook, we describe how to use Statsmodels to compute the impacts of updated or revised datasets on out-of-sample forecasts or in-sample estimates of missing data. We follow the approach of the \"Nowcasting\" literature (see references at the end), by using a state space model to compute the \"news\" and impacts of incoming data.\n",
    "\n",
    "**Note**: this notebook applies to Statsmodels v0.12+. In addition, it only applies to the state space models or related classes, which are: `sm.tsa.statespace.ExponentialSmoothing`, `sm.tsa.arima.ARIMA`, `sm.tsa.SARIMAX`, `sm.tsa.UnobservedComponents`, `sm.tsa.VARMAX`, and `sm.tsa.DynamicFactor`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import statsmodels.api as sm\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "macrodata = sm.datasets.macrodata.load_pandas().data\n",
    "macrodata.index = pd.period_range('1959Q1', '2009Q3', freq='Q')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Forecasting exercises often start with a fixed set of historical data that is used for model selection and parameter estimation. Then, the fitted selected model (or models) can be used to create out-of-sample forecasts. Most of the time, this is not the end of the story. As new data comes in, you may need to evaluate your forecast errors, possibly update your models, and create updated out-of-sample forecasts. This is sometimes called a \"real-time\" forecasting exercise (by contrast, a pseudo real-time exercise is one in which you simulate this procedure).\n",
    "\n",
    "If all that matters is minimizing some loss function based on forecast errors (like MSE), then when new data comes in you may just want to completely redo model selection, parameter estimation and out-of-sample forecasting, using the updated datapoints. If you do this, your new forecasts will have changed for two reasons:\n",
    "\n",
    "1. You have received new data that gives you new information\n",
    "2. Your forecasting model or the estimated parameters are different\n",
    "\n",
    "In this notebook, we focus on methods for isolating the first effect. The way we do this comes from the so-called \"nowcasting\" literature, and in particular Bańbura, Giannone, and Reichlin (2011), Bańbura and Modugno (2014), and Bańbura et al. (2014). They describe this exercise as computing the \"**news**\", and we follow them in using this language in Statsmodels.\n",
    "\n",
    "These methods are perhaps most useful with multivariate models, since there multiple variables may update at the same time, and it is not immediately obvious what forecast change was created by what updated variable. However, they can still be useful for thinking about forecast revisions in univariate models. We will therefore start with the simpler univariate case to explain how things work, and then move to the multivariate case afterwards."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note on revisions**: the framework that we are using is designed to decompose changes to forecasts from newly observed datapoints. It can also take into account *revisions* to previously published datapoints, but it does not decompose them separately. Instead, it only shows the aggregate effect of \"revisions\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note on `exog` data**: the framework that we are using only decomposes changes to forecasts from newly observed datapoints for *modeled* variables. These are the \"left-hand-side\" variables that in Statsmodels are given in the `endog` arguments. This framework does not decompose or account for changes to unmodeled \"right-hand-side\" variables, like those included in the `exog` argument."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simple univariate example: AR(1)\n",
    "\n",
    "We will begin with a simple autoregressive model, an AR(1):\n",
    "\n",
    "$$y_t = \\phi y_{t-1} + \\varepsilon_t$$\n",
    "\n",
    "- The parameter $\\phi$ captures the persistence of the series\n",
    "\n",
    "We will use this model to forecast inflation.\n",
    "\n",
    "To make it simpler to describe the forecast updates in this notebook, we will work with inflation data that has been de-meaned, but it is straightforward in practice to augment the model with a mean term.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# De-mean the inflation series\n",
    "y = macrodata['infl'] - macrodata['infl'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 1: fitting the model on the available dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we'll simulate an out-of-sample exercise, by constructing and fitting our model using all of the data except the last five observations. We'll assume that we haven't observed these values yet, and then in subsequent steps we'll add them back into the analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pre = y.iloc[:-5]\n",
    "y_pre.plot(figsize=(15, 3), title='Inflation');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To construct forecasts, we first estimate the parameters of the model. This returns a results object that we will be able to use produce forecasts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mod_pre = sm.tsa.arima.ARIMA(y_pre, order=(1, 0, 0), trend='n')\n",
    "res_pre = mod_pre.fit()\n",
    "print(res_pre.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Creating the forecasts from the results object `res` is easy - you can just call the `forecast` method with the number of forecasts you want to construct. In this case, we'll construct four out-of-sample forecasts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute the forecasts\n",
    "forecasts_pre = res_pre.forecast(4)\n",
    "\n",
    "# Plot the last 3 years of data and the four out-of-sample forecasts\n",
    "y_pre.iloc[-12:].plot(figsize=(15, 3), label='Data', legend=True)\n",
    "forecasts_pre.plot(label='Forecast', legend=True);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the AR(1) model, it is also easy to manually construct the forecasts. Denoting the last observed variable as $y_T$ and the $h$-step-ahead forecast as $y_{T+h|T}$, we have:\n",
    "\n",
    "$$y_{T+h|T} = \\hat \\phi^h y_T$$\n",
    "\n",
    "Where $\\hat \\phi$ is our estimated value for the AR(1) coefficient. From the summary output above, we can see that this is the first parameter of the model, which we can access from the `params` attribute of the results object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the estimated AR(1) coefficient\n",
    "phi_hat = res_pre.params[0]\n",
    "\n",
    "# Get the last observed value of the variable\n",
    "y_T = y_pre.iloc[-1]\n",
    "\n",
    "# Directly compute the forecasts at the horizons h=1,2,3,4\n",
    "manual_forecasts = pd.Series([phi_hat * y_T, phi_hat**2 * y_T,\n",
    "                              phi_hat**3 * y_T, phi_hat**4 * y_T],\n",
    "                             index=forecasts_pre.index)\n",
    "\n",
    "# We'll print the two to double-check that they're the same\n",
    "print(pd.concat([forecasts_pre, manual_forecasts], axis=1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 2: computing the \"news\" from a new observation\n",
    "\n",
    "Suppose that time has passed, and we have now received another observation. Our dataset is now larger, and we can evaluate our forecast error and produce updated forecasts for the subsequent quarters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the next observation after the \"pre\" dataset\n",
    "y_update = y.iloc[-5:-4]\n",
    "\n",
    "# Print the forecast error\n",
    "print('Forecast error: %.2f' % (y_update.iloc[0] - forecasts_pre.iloc[0]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To compute forecasts based on our updated dataset, we will create an updated results object `res_post` using the `append` method, to append on our new observation to the previous dataset.\n",
    "\n",
    "Note that by default, the `append` method does not re-estimate the parameters of the model. This is exactly what we want here, since we want to isolate the effect on the forecasts of the new information only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a new results object by passing the new observations to the `append` method\n",
    "res_post = res_pre.append(y_update)\n",
    "\n",
    "# Since we now know the value for 2008Q3, we will only use `res_post` to\n",
    "# produce forecasts for 2008Q4 through 2009Q2\n",
    "forecasts_post = pd.concat([y_update, res_post.forecast('2009Q2')])\n",
    "print(forecasts_post)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, the forecast error is quite large - inflation was more than 10 percentage points below the AR(1) models' forecast. (This was largely because of large swings in oil prices around the global financial crisis)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To analyse this in more depth, we can use Statsmodels to isolate the effect of the new information - or the \"**news**\" - on our forecasts. This means that we do not yet want to change our model or re-estimate the parameters. Instead, we will use the `news` method that is available in the results objects of state space models.\n",
    "\n",
    "Computing the news in Statsmodels always requires a *previous* results object or dataset, and an *updated* results object or dataset. Here we will use the original results object `res_pre` as the previous results and the `res_post` results object that we just created as the updated results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once we have previous and updated results objects or datasets, we can compute the news by calling the `news` method. Here, we will call `res_pre.news`, and the first argument will be the updated results, `res_post` (however, if you have two results objects, the `news` method could can be called on either one).\n",
    "\n",
    "In addition to specifying the comparison object or dataset as the first argument, there are a variety of other arguments that are accepted. The most important specify the \"impact periods\" that you want to consider. These \"impact periods\" correspond to the forecasted periods of interest; i.e. these dates specify with periods will have forecast revisions decomposed.\n",
    "\n",
    "To specify the impact periods, you must pass two of `start`, `end`, and `periods` (similar to the Pandas `date_range` method). If your time series was a Pandas object with an associated date or period index, then you can pass dates as values for `start` and `end`, as we do below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute the impact of the news on the four periods that we previously\n",
    "# forecasted: 2008Q3 through 2009Q2\n",
    "news = res_pre.news(res_post, start='2008Q3', end='2009Q2')\n",
    "# Note: one alternative way to specify these impact dates is\n",
    "# `start='2008Q3', periods=4`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The variable `news` is an object of the class `NewsResults`, and it contains details about the updates to the data in `res_post` compared to `res_pre`, the new information in the updated dataset, and the impact that the new information had on the forecasts in the period between `start` and `end`.\n",
    "\n",
    "One easy way to summarize the results are with the `summary` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(news.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Summary output**: the default summary for this news results object printed four tables:\n",
    "\n",
    "1. Summary of the model and datasets\n",
    "2. Details of the news from updated data\n",
    "3. Summary of the impacts of the new information on the forecasts between `start='2008Q3'` and `end='2009Q2'`\n",
    "4. Details of how the updated data led to the impacts on the forecasts between `start='2008Q3'` and `end='2009Q2'`\n",
    "\n",
    "These are described in more detail below.\n",
    "\n",
    "*Notes*:\n",
    "\n",
    "- There are a number of arguments that can be passed to the `summary` method to control this output. Check the documentation / docstring for details.\n",
    "- Table (4), showing details of the updates and impacts, can become quite large if the model is multivariate, there are multiple updates, or a large number of impact dates are selected. It is only shown by default for univariate models."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**First table: summary of the model and datasets**\n",
    "\n",
    "The first table, above, shows:\n",
    "\n",
    "- The type of model from which the forecasts were made. Here this is an ARIMA model, since an AR(1) is a special case of an ARIMA(p,d,q) model.\n",
    "- The date and time at which the analysis was computed.\n",
    "- The original sample period, which here corresponds to `y_pre`\n",
    "- The endpoint of the updated sample period, which here is the last date in `y_post`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Second table: the news from updated data**\n",
    "\n",
    "This table simply shows the forecasts from the previous results for observations that were updated in the updated sample.\n",
    "\n",
    "*Notes*:\n",
    "\n",
    "- Our updated dataset `y_post` did not contain any *revisions* to previously observed datapoints. If it had, there would be an additional table showing the previous and updated values of each such revision."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Third table: summary of the impacts of the new information**\n",
    "\n",
    "*Columns*:\n",
    "\n",
    "The third table, above, shows:\n",
    "\n",
    "- The previous forecast for each of the impact dates, in the \"estimate (prev)\" column\n",
    "- The impact that the new information (the \"news\") had on the forecasts for each of the impact dates, in the \"impact of news\" column\n",
    "- The updated forecast for each of the impact dates, in the \"estimate (new)\" column\n",
    "\n",
    "*Notes*:\n",
    "\n",
    "- In multivariate models, this table contains additional columns describing the relevant impacted variable for each row.\n",
    "- Our updated dataset `y_post` did not contain any *revisions* to previously observed datapoints. If it had, there would be additional columns in this table showing the impact of those revisions on the forecasts for the impact dates.\n",
    "- Note that `estimate (new) = estimate (prev) + impact of news`\n",
    "- This table can be accessed independently using the `summary_impacts` method.\n",
    "\n",
    "*In our example*:\n",
    "\n",
    "Notice that in our example, the table shows the values that we computed earlier:\n",
    "\n",
    "- The \"estimate (prev)\" column is identical to the forecasts from our previous model, contained in the `forecasts_pre` variable.\n",
    "- The \"estimate (new)\" column is identical to our `forecasts_post` variable, which contains the observed value for 2008Q3 and the forecasts from the updated model for 2008Q4 - 2009Q2."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Fourth table: details of updates and their impacts**\n",
    "\n",
    "The fourth table, above, shows how each new observation translated into specific impacts at each impact date.\n",
    "\n",
    "*Columns*:\n",
    "\n",
    "The first three columns table described the relevant **update** (an \"updated\" is a new observation):\n",
    "\n",
    "- The first column (\"update date\") shows the date of the variable that was updated.\n",
    "- The second column (\"forecast (prev)\") shows the value that would have been forecasted for the update variable at the update date based on the previous results / dataset.\n",
    "- The third column (\"observed\") shows the actual observed value of that updated variable / update date in the updated results / dataset.\n",
    "\n",
    "The last four columns described the **impact** of a given update (an impact is a changed forecast within the \"impact periods\").\n",
    "\n",
    "- The fourth column (\"impact date\") gives the date at which the given update made an impact.\n",
    "- The fifth column (\"news\") shows the \"news\" associated with the given update (this is the same for each impact of a given update, but is just not sparsified by default)\n",
    "- The sixth column (\"weight\") describes the weight that the \"news\" from the given update has on the impacted variable at the impact date. In general, weights will be different between each \"updated variable\" / \"update date\" / \"impacted variable\" / \"impact date\" combination.\n",
    "- The seventh column (\"impact\") shows the impact that the given update had on the given \"impacted variable\" / \"impact date\".\n",
    "\n",
    "*Notes*:\n",
    "\n",
    "- In multivariate models, this table contains additional columns to show the relevant variable that was updated and variable that was impacted for each row. Here, there is only one variable (\"infl\"), so those columns are suppressed to save space.\n",
    "- By default, the updates in this table are \"sparsified\" with blanks, to avoid repeating the same values for \"update date\", \"forecast (prev)\", and \"observed\" for each row of the table. This behavior can be overridden using the `sparsify` argument.\n",
    "- Note that `impact = news * weight`.\n",
    "- This table can be accessed independently using the `summary_details` method.\n",
    "\n",
    "*In our example*:\n",
    "\n",
    "- For the update to 2008Q3 and impact date 2008Q3, the weight is equal to 1. This is because we only have one variable, and once we have incorporated the data for 2008Q3, there is no no remaining ambiguity about the \"forecast\" for this date. Thus all of the \"news\" about this variable at 2008Q3 passes through to the \"forecast\" directly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Addendum: manually computing the news, weights, and impacts\n",
    "\n",
    "For this simple example with a univariate model, it is straightforward to compute all of the values shown above by hand. First, recall the formula for forecasting $y_{T+h|T} = \\phi^h y_T$, and note that it follows that we also have $y_{T+h|T+1} = \\phi^h y_{T+1}$. Finally, note that $y_{T|T+1} = y_T$, because if we know the value of the observations through $T+1$, we know the value of $y_T$.\n",
    "\n",
    "**News**: The \"news\" is nothing more than the forecast error associated with one of the new observations. So the news associated with observation $T+1$ is:\n",
    "\n",
    "$$n_{T+1} = y_{T+1} - y_{T+1|T} = Y_{T+1} - \\phi Y_T$$\n",
    "\n",
    "**Impacts**: The impact of the news is the difference between the updated and previous forecasts, $i_h \\equiv y_{T+h|T+1} - y_{T+h|T}$.\n",
    "\n",
    "- The previous forecasts for $h=1, \\dots, 4$ are: $\\begin{pmatrix} \\phi y_T & \\phi^2 y_T & \\phi^3 y_T & \\phi^4 y_T \\end{pmatrix}'$. \n",
    "- The updated forecasts for $h=1, \\dots, 4$ are: $\\begin{pmatrix} y_{T+1} & \\phi y_{T+1} & \\phi^2 y_{T+1} & \\phi^3 y_{T+1} \\end{pmatrix}'$.\n",
    "\n",
    "The impacts are therefore:\n",
    "\n",
    "$$\\{ i_h \\}_{h=1}^4 = \\begin{pmatrix} y_{T+1} - \\phi y_T \\\\ \\phi (Y_{T+1} - \\phi y_T) \\\\ \\phi^2 (Y_{T+1} - \\phi y_T) \\\\ \\phi^3 (Y_{T+1} - \\phi y_T) \\end{pmatrix}$$\n",
    "\n",
    "**Weights**: To compute the weights, we just need to note that it is immediate that we can rewrite the impacts in terms of the forecast errors, $n_{T+1}$.\n",
    "\n",
    "$$\\{ i_h \\}_{h=1}^4 = \\begin{pmatrix} 1 \\\\ \\phi \\\\ \\phi^2 \\\\ \\phi^3 \\end{pmatrix} n_{T+1}$$\n",
    "\n",
    "The weights are then simply $w = \\begin{pmatrix} 1 \\\\ \\phi \\\\ \\phi^2 \\\\ \\phi^3 \\end{pmatrix}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can check that this is what the `news` method has computed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the news, computed by the `news` method\n",
    "print(news.news)\n",
    "\n",
    "# Manually compute the news\n",
    "print()\n",
    "print((y_update.iloc[0] - phi_hat * y_pre.iloc[-1]).round(6))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the total impacts, computed by the `news` method\n",
    "# (Note: news.total_impacts = news.revision_impacts + news.update_impacts, but\n",
    "# here there are no data revisions, so total and update impacts are the same)\n",
    "print(news.total_impacts)\n",
    "\n",
    "# Manually compute the impacts\n",
    "print()\n",
    "print(forecasts_post - forecasts_pre)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the weights, computed by the `news` method\n",
    "print(news.weights)\n",
    "\n",
    "# Manually compute the weights\n",
    "print()\n",
    "print(np.array([1, phi_hat, phi_hat**2, phi_hat**3]).round(6))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multivariate example: dynamic factor\n",
    "\n",
    "In this example, we'll consider forecasting monthly core price inflation based on the Personal Consumption Expenditures (PCE) price index and the Consumer Price Index (CPI), using a Dynamic Factor model. Both of these measures track prices in the US economy and are based on similar source data, but they have a number of definitional differences. Nonetheless, they track each other relatively well, so modeling them jointly using a single dynamic factor seems reasonable.\n",
    "\n",
    "One reason that this kind of approach can be useful is that the CPI is released earlier in the month than the PCE. One the CPI is released, therefore, we can update our dynamic factor model with that additional datapoint, and obtain an improved forecast for that month's PCE release. A more involved version of this kind of analysis is available in Knotek and Zaman (2017)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We start by downloading the core CPI and PCE price index data from [FRED](https://fred.stlouisfed.org/), converting them to annualized monthly inflation rates, removing two outliers, and de-meaning each series (the dynamic factor model does not "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas_datareader as pdr\n",
    "levels = pdr.get_data_fred(['PCEPILFE', 'CPILFESL'], start='1999', end='2019').to_period('M')\n",
    "infl = np.log(levels).diff().iloc[1:] * 1200\n",
    "infl.columns = ['PCE', 'CPI']\n",
    "\n",
    "# Remove two outliers and de-mean the series\n",
    "infl['PCE'].loc['2001-09':'2001-10'] = np.nan"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To show how this works, we'll imagine that it is April 14, 2017, which is the data of the March 2017 CPI release. So that we can show the effect of multiple updates at once, we'll assume that we haven't updated our data since the end of January, so that:\n",
    "\n",
    "- Our **previous dataset** will consist of all values for the PCE and CPI through January 2017\n",
    "- Our **updated dataset** will additionally incorporate the CPI for February and March 2017 and the PCE data for February 2017. But it will not yet the PCE (the March 2017 PCE price index was not released until May 1, 2017)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Previous dataset runs through 2017-02\n",
    "y_pre = infl.loc[:'2017-01'].copy()\n",
    "const_pre = np.ones(len(y_pre))\n",
    "print(y_pre.tail())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# For the updated dataset, we'll just add in the\n",
    "# CPI value for 2017-03\n",
    "y_post = infl.loc[:'2017-03'].copy()\n",
    "y_post.loc['2017-03', 'PCE'] = np.nan\n",
    "const_post = np.ones(len(y_post))\n",
    "\n",
    "# Notice the missing value for PCE in 2017-03\n",
    "print(y_post.tail())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We chose this particular example because in March 2017, core CPI prices fell for the first time since 2010, and this information may be useful in forecast core PCE prices for that month. The graph below shows the CPI and PCE price data as it would have been observed on April 14th$^\\dagger$.\n",
    "\n",
    "-----\n",
    "\n",
    "$\\dagger$ This statement is not entirely true, because both the CPI and PCE price indexes can be revised to a certain extent after the fact. As a result, the series that we're pulling are not exactly like those observed on April 14, 2017. This could be fixed by pulling the archived data from [ALFRED](https://alfred.stlouisfed.org/) instead of [FRED](https://fred.stlouisfed.org/), but the data we have is good enough for this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot the updated dataset\n",
    "fig, ax = plt.subplots(figsize=(15, 3))\n",
    "y_post.plot(ax=ax)\n",
    "ax.hlines(0, '2009', '2017-06', linewidth=1.0)\n",
    "ax.set_xlim('2009', '2017-06');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To perform the exercise, we first construct and fit a `DynamicFactor` model. Specifically:\n",
    "\n",
    "- We are using a single dynamic factor (`k_factors=1`)\n",
    "- We are modeling the factor's dynamics with an AR(6) model (`factor_order=6`)\n",
    "- We have included a vector of ones as an exogenous variable (`exog=const_pre`), because the inflation series we are working with are not mean-zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mod_pre = sm.tsa.DynamicFactor(y_pre, exog=const_pre, k_factors=1, factor_order=6)\n",
    "res_pre = mod_pre.fit()\n",
    "print(res_pre.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the fitted model in hand, we now construct the news and impacts associated with observing the CPI for March 2017. The updated data is for February 2017 and part of March 2017, and we'll examining the impacts on both March and April.\n",
    "\n",
    "In the univariate example, we first created an updated results object, and then passed that to the `news` method. Here, we're creating the news by directly passing the updated dataset.\n",
    "\n",
    "Notice that:\n",
    "\n",
    "1. `y_post` contains the entire updated dataset (not just the new datapoints)\n",
    "2. We also had to pass an updated `exog` array. This array must cover **both**:\n",
    "    - The entire period associated with `y_post`\n",
    "    - Any additional datapoints after the end of `y_post` through the last impact date, specified by `end`\n",
    "\n",
    "   Here, `y_post` ends in March 2017, so we needed our `exog` to extend one more period, to April 2017."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the news results\n",
    "# Note\n",
    "const_post_plus1 = np.ones(len(y_post) + 1)\n",
    "news = res_pre.news(y_post, exog=const_post_plus1, start='2017-03', end='2017-04')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Note**:\n",
    ">\n",
    "> In the univariate example, above, we first constructed a new results object, and then passed that to the `news` method. We could have done that here too, although there is an extra step required. Since we are requesting an impact for a period beyond the end of `y_post`, we would still need to pass the additional value for the `exog` variable during that period to `news`:\n",
    "> \n",
    "> ```python\n",
    "res_post = res_pre.apply(y_post, exog=const_post)\n",
    "news = res_pre.news(res_post, exog=[1.], start='2017-03', end='2017-04')\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have computed the `news`, printing `summary` is a convenient way to see the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show the summary of the news results\n",
    "print(news.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because we have multiple variables, by default the summary only shows the news from updated data along and the total impacts.\n",
    "\n",
    "From the first table, we can see that our updated dataset contains three new data points, with most of the \"news\" from these data coming from the very low reading in March 2017.\n",
    "\n",
    "The second table shows that these three datapoints substantially impacted the estimate for PCE in March 2017 (which was not yet observed). This estimate revised down by nearly 1.5 percentage points.\n",
    "\n",
    "The updated data also impacted the forecasts in the first out-of-sample month, April 2017. After incorporating the new data, the model's forecasts for CPI and PCE inflation in that month revised down 0.29 and 0.17 percentage point, respectively."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While these tables show the \"news\" and the total impacts, they do not show how much of each impact was caused by each updated datapoint. To see that information, we need to look at the details tables.\n",
    "\n",
    "One way to see the details tables is to pass `include_details=True` to the `summary` method. To avoid repeating the tables above, however, we'll just call the `summary_details` method directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(news.summary_details())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This table shows that most of the revisions to the estimate of PCE in April 2017, described above, came from the news associated with the CPI release in March 2017. By contrast, the CPI release in February had only a little effect on the April forecast, and the PCE release in February had essentially no effect."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bibliography\n",
    "\n",
    "Bańbura, Marta, Domenico Giannone, and Lucrezia Reichlin. \"Nowcasting.\" The Oxford Handbook of Economic Forecasting. July 8, 2011.\n",
    "\n",
    "Bańbura, Marta, Domenico Giannone, Michele Modugno, and Lucrezia Reichlin. \"Now-casting and the real-time data flow.\" In Handbook of economic forecasting, vol. 2, pp. 195-237. Elsevier, 2013.\n",
    "\n",
    "Bańbura, Marta, and Michele Modugno. \"Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data.\" Journal of Applied Econometrics 29, no. 1 (2014): 133-160.\n",
    "\n",
    "Knotek, Edward S., and Saeed Zaman. \"Nowcasting US headline and core inflation.\" Journal of Money, Credit and Banking 49, no. 5 (2017): 931-968."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
